2.2 Main report
2.2.1 Overview of PhD
I am on a 3.5-year GW4 BioMed MRC DTP PhD. I am in my third year and expect to finish in April 2021. The year 1 report and presentation can be downloaded from GitHub.
2.2.1.1 Rationale
The number of individuals living with overweight and obesity is at an all-time high. Globally, 39% of adults (18+) are estimated to be overweight and 13% obese1 (Figures 1 and 2), and these numbers are expected to continue to rise2–4. Obesity is estimated to be responsible for 8% of global deaths5 (Figure 3). With the number of overweight and obese individuals increasing2–4, the number of premature deaths is likely to rise too.
Figure 1: Proportion of overweight individuals
Figure 1, reproduced from Ritchie and Roser (2019)6, shows the share of adults (18+) who are overweight globally and in 5 selected geographic regions (Americas, Europe, Eastern Mediterranean, Africa and South-East Asia) from 1975 to 2016.
Figure 2: Proportion of obese individuals
Figure 2, reproduced from Ritchie and Roser (2019)6, shows the share of adults (18+) who are obese globally and in 5 selected geographic regions (Americas, Europe, Eastern Mediterranean, Africa and South-East Asia) from 1975 to 2016.
Figure 3: Number of deaths by risk factor
Figure 3, reproduced from Ritchie and Roser (2019)6, shows the number of deaths attributed to each of 26 risk factors globally in 2017, across all age groups. Obesity is the fifth leading risk factor, with 3.41 million deaths in 2017.
Conventionally, overweight and obesity are measured using body mass index (BMI), with overweight and obesity classified as a BMI of 25–29.9 kg/m2 and ≥ 30 kg/m2, respectively. A normal weight classification is a BMI of 18.5–24.9 kg/m2, with underweight below this. In the crudest sense, BMI is a measure of weight adjusted for height (weight in kilograms divided by the square of height in metres). BMI is associated with numerous diseases and, for many of them, provides an accurate measure of risk at the population level. However, BMI does not have the resolution to accurately measure an individual’s body composition7–10, i.e. the amount and location of adipose tissue within the body, and studies have pointed to a more important role for fat deposition in disease development11,12. As such, complementary assessment of increased adiposity using a combination of body composition measures (e.g. BMI, waist-to-hip ratio, body fat %) may provide additional insight into associations with disease13,14.
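The following minimal Python sketch illustrates the BMI definition and cut-offs above. It computes BMI from weight and height and assigns the conventional adult category. The categories are treated as half-open intervals (for example, 25 ≤ BMI < 30 for overweight). This is an assumption made here so that values falling between 24.9 and 25 are not left unclassified. The function names are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_class(value: float) -> str:
    """Map a BMI value (kg/m2) onto conventional adult categories.

    Assumption: categories are half-open intervals, e.g. 25 <= BMI < 30.
    """
    if value < 18.5:
        return "underweight"
    if value < 25:
        return "normal weight"
    if value < 30:
        return "overweight"
    return "obese"


# Example: 85 kg at 1.75 m
example = bmi(85, 1.75)
print(round(example, 1), bmi_class(example))  # prints: 27.8 overweight
```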
Adipose tissues are prolific signallers to surrounding and systemic tissues15,16, leading to large downstream effects with potentially harmful consequences16–19. Changes in adipose tissue abundance are reflected in adipocyte signalling. These changes are concurrent with shifts in metabolic profiles, in which alterations to the level of one metabolite do not occur in isolation. Metabolites sit at the interface between genetic and non-genetic factors, provide a useful read-out of physiological function, and have been genetically well characterised20,21.
2.2.1.2 Aims
The biological pathway from increased adiposity to disease development is unclear. Adipose tissue is a prolific signalling organ, producing systemic changes across the body. Metabolic changes may result from increased adiposity and its subsequent signalling, and evidence has highlighted the role of metabolites in disease. The aim of this thesis is to:
- Identify metabolites that sit on the causal pathway from increased adiposity to disease
2.2.1.3 Objectives
In order to achieve this aim and better understand the biological mechanisms underlying disease development this thesis will:
- Identify all traits causally associated with increased adiposity
- We will perform a systematic review of all Mendelian randomization (MR) studies investigating measures of increased adiposity with any outcome
- We will use the findings from this review to guide subsequent work
- Identify and describe appropriate instrumentation of increased adiposity
- Identify metabolites associated with increased adiposity
- We will use observational and Mendelian randomization analyses to identify metabolites associated with multiple measures of increased adiposity
- Design and implement methods to cluster metabolites
- Metabolites are complicated and highly correlated; we will develop methods to cluster metabolites and propose rules for instrumenting metabolites and clusters for Mendelian randomization analyses
- Identify diseases associated with metabolites
- We will use observational and Mendelian randomization analyses to identify metabolites associated with diseases
- We will be guided by the systematic review and metabolites we identify as associated with increased adiposity
2.2.1.4 Layout
Figure 4: Overview of PhD chapters
Figure 4 shows an overview of proposed chapters for the thesis, including progress to date and expected outcomes, in order to achieve the described aim and objectives.
2.2.2 Chapter progress
The thesis is laid out as a pipeline: a checklist of what to do when researchers want to understand the causal associations between exposures and outcomes using metabolites as intermediates. Chapter 1 introduces the context of the thesis and what we currently know about increased adiposity and disease. The pipeline starts in Chapter 2 with the identification of causally associated diseases. It then progresses through choosing instruments for the exposures, performing observational analyses and the first step of the MR, and visualising the results of MR work on a large scale. It continues with instrumenting metabolites as intermediates, before performing the final MR step from intermediate to disease. The Gantt chart below lays out the plan for chapter progress over the coming 12 months:
2.2.2.1 Chapter 1: Introduction
2.2.2.1.1 Overview
This chapter provides the context of the thesis, i.e. the diseases with which increased adiposity is associated. It provides background on adipose tissue and its products, such as metabolites. It gives an overview of observational research on diseases associated with increased adiposity. It goes on to explore what metabolites might contribute to understanding these associations and how MR may help investigate them. This chapter also includes the aims and objectives of the thesis, which are described above.
2.2.2.2 Chapter 2: Systematic review
2.2.2.2.1 Overview
Chapter 1 shows that the literature clearly associates numerous diseases with increased adiposity. However, the causal associations between increased adiposity and disease are less clear. As MR has been used increasingly and more datasets have become available, a large body of evidence has built up for causal associations between increased adiposity and a number of diseases. Chapter 2 sets out to synthesise this evidence and identify the diseases causally associated with increased adiposity. These diseases will be used in the second step of the MR to identify whether metabolites are associated with them. The systematic review will include a meta-analysis; however, the time constraints of the PhD may mean this is not completed within the time frame.
2.2.2.2.2 Progress
Approximately 150 papers were identified and included for data extraction. Data extraction is ongoing and is expected to be completed by the end of February, with a draft manuscript/chapter by the end of March.
2.2.2.3 Chapter 3: Instrumentation
2.2.2.3.1 Overview
Before starting an analysis, one must first identify the exposure. In MR analysis, identifying the exposure includes deciding how to instrument it. Traditionally, this has meant selecting independent genetic variants that reach a genome-wide significance threshold (P < 5 × 10^-8) from the largest available GWAS. For measures of increased adiposity, especially BMI, there are now many GWASs for researchers to choose from. Chapter 3 explores how to instrument increased adiposity, including investigating the relationship between the exposures and the different GWASs available.
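The traditional instrument-selection step can be sketched as follows. First, keep the variants below the genome-wide threshold. Then greedily retain the most significant variant in each region. This sketch rests on a simplifying assumption: independence is approximated by physical distance alone, using a 10 Mb window chosen here for illustration. In practice, clumping uses linkage disequilibrium (r2) estimated from a reference panel. The `Variant` class, the `select_instruments` function and the rs identifiers are hypothetical.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8


@dataclass
class Variant:
    rsid: str
    chrom: str
    pos: int      # base-pair position
    pval: float   # p-value for association with the exposure


def select_instruments(variants, p_threshold=GENOME_WIDE_P, window_bp=10_000_000):
    """Keep genome-wide significant variants, then greedily clump them.

    Simplification (assumption): independence is judged by distance only;
    real clumping uses LD (r2) from a reference panel.
    """
    significant = [v for v in variants if v.pval < p_threshold]
    # Most significant first, so each retained variant is a local lead SNP
    significant.sort(key=lambda v: v.pval)
    kept = []
    for v in significant:
        too_close = any(
            k.chrom == v.chrom and abs(k.pos - v.pos) < window_bp for k in kept
        )
        if not too_close:
            kept.append(v)
    return kept


gwas = [
    Variant("rsA", "16", 1_000_000, 1e-20),
    Variant("rsB", "16", 1_500_000, 1e-12),  # within 10 Mb of rsA: dropped
    Variant("rsC", "2", 5_000_000, 3e-9),
    Variant("rsD", "2", 90_000_000, 1e-6),   # not genome-wide significant
]
print([v.rsid for v in select_instruments(gwas)])  # ['rsA', 'rsC']
```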
2.2.2.3.2 Progress
This will be a short chapter with a small amount of analysis showing the appropriateness of the instruments selected for the MR analysis. Some of the analysis has been conducted and part of the chapter has been written. An unformatted draft can be viewed on GitHub.
2.2.2.4 Chapter 4: Observational analysis
2.2.2.4.1 Overview
Having established in Chapter 3 how to instrument increased adiposity in observational and MR analyses, this chapter explores the observational associations between increased adiposity and metabolites, with a particular focus on confounders.
2.2.2.4.2 Progress
Not started. I now have access to UK Biobank data and we should have access to the metabolomics data in the coming months.
2.2.2.5 Chapter 5: MR step 1
2.2.2.5.1 Overview
This chapter is the first step of the MR process for identifying intermediate metabolites. The main analysis includes 3 exposures and 123 metabolites from Kettunen et al. (2016)21. Additional sensitivity analyses of 17 other measures of adiposity and 4 methods have also been performed, giving 9,840 tests in total. In addition, I have performed the same analysis with a second metabolite dataset of 452 metabolites20 (36,160 tests in total). I think this second analysis will probably not be included in the thesis, as that metabolite GWAS is not as well powered or as clean as the Kettunen metabolite GWAS.
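The following sketch shows the fixed-effect inverse-variance weighted (IVW) estimator, computed from GWAS summary statistics. It is included for readers less familiar with two-sample MR. IVW is a weighted average of per-variant Wald ratios. The report does not name the four methods used, so this is not presented as the exact analysis performed; IVW is shown only because it is the usual primary estimator. The function names and input values are hypothetical. The standard errors use the common first-order approximation, which ignores uncertainty in the variant-exposure estimates.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-variant causal estimate with a first-order standard error."""
    return beta_out / beta_exp, se_out / abs(beta_exp)


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate from per-variant summary statistics.

    beta_exp: variant-exposure effects; beta_out / se_out: variant-outcome
    effects and their standard errors. Equivalent to averaging the Wald
    ratios with weights beta_exp**2 / se_out**2.
    """
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out))
    return num / den, 1 / math.sqrt(den)


# Hypothetical summary statistics for three instruments
bx = [0.10, 0.20, 0.05]
by = [0.05, 0.10, 0.025]
se = [0.01, 0.02, 0.01]
print(ivw(bx, by, se))
```

In this toy example every variant implies the same causal effect (0.5), so the IVW estimate recovers that value. The standard error is 1/15, because the weights sum to 225.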
2.2.2.5.2 Progress
I am two-thirds of the way through the manuscript, which has been written as if I were writing the chapter. I need to finish the manuscript, transfer it into the chapter, and then cut the manuscript down into a publishable document. This project is my first attempt at making my code completely reproducible, and it is laid out in full on GitHub (currently private).
2.2.2.6 Chapter 6: MR Viz
2.2.2.6.1 Overview
Having performed a large MR analysis, the total number of tests is 9,840. This comprises 3 exposures and 123 outcomes (369 tests) as the main analysis, and three sensitivity methods (1,476 tests across all four methods). It also includes an additional 17 measures of adiposity (2,091 tests), each also analysed with all four methods (8,364 tests). It is difficult to visualise and interpret this volume of results. Given that we want to look at the global profile of metabolite changes resulting from increased adiposity, we need to be able to visualise these data in an interpretable manner. This chapter demonstrates a web application and R package developed to create Circos plots for visualising and interpreting these types of MR analyses.
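The test counts above can be checked with simple arithmetic. This sketch assumes, as the figures imply, that the four methods are the main method plus three sensitivity methods. It also assumes that every exposure is tested against every metabolite with every method. The variable and function names are illustrative only.

```python
N_METHODS = 4              # assumed: main method plus three sensitivity methods
MAIN_EXPOSURES = 3
EXTRA_EXPOSURES = 17       # additional measures of adiposity
N_METABOLITES = 123        # Kettunen et al. (2016)


def n_tests(n_exposures, n_metabolites, n_methods=N_METHODS):
    """Every exposure tested against every metabolite with every method."""
    return n_exposures * n_metabolites * n_methods


main_one_method = n_tests(MAIN_EXPOSURES, N_METABOLITES, n_methods=1)    # 369
main_all_methods = n_tests(MAIN_EXPOSURES, N_METABOLITES)                # 1,476
extra_one_method = n_tests(EXTRA_EXPOSURES, N_METABOLITES, n_methods=1)  # 2,091
extra_all_methods = n_tests(EXTRA_EXPOSURES, N_METABOLITES)              # 8,364
total = main_all_methods + extra_all_methods                             # 9,840

# Same design applied to the second dataset of 452 metabolites (Chapter 5)
total_452 = n_tests(MAIN_EXPOSURES + EXTRA_EXPOSURES, 452)               # 36,160
print(total, total_452)  # 9840 36160
```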
2.2.2.6.2 Progress
I am two-thirds of the way through the manuscript. The manuscript and the GitHub page will be adapted to form the chapter, so the chapter is essentially two-thirds complete. The web application is in beta and is usable. I need to do some focus-group work with the research group to make the website user-friendly and to incorporate any additional features or wording they think are needed. The R package is available on GitHub and is in its final stage, with testing still to be done. All of this, including the manuscript, should be finished within the next two months. I just need to get some people together to try out the app and R package to make sure they work and do not break.
Figure 5: Circos plot produced using R package from analysis performed in Chapter 5
2.2.2.7 Chapter 7: Clustering metabolites
2.2.2.7.1 Overview
Having identified metabolites associated with increased adiposity from the global profile, we need to decide how to instrument them in the second step of the MR analysis. There are essentially two ways to do this: either use each metabolite individually, as one normally would, or combine metabolites into a group that is then instrumented. In Chapter 9 I will use both individual metabolites and groups. In this chapter I will explore a number of different methods for clustering metabolites into groups that can be instrumented.
2.2.2.7.2 Progress
Not started. Current ideas for clustering methods to explore:
- Priors
- class
- subclass
- biological pathway
- size
- shared genetic variants
- No priors
- PCA
- factor analysis
- hierarchical clustering
- density clustering
- self-organising map
- LDSR
- ontology
- have discussed with Ben Elsworth; a pipeline is already set up that can be adapted to implement this, and it would make an interesting case study for their paper.
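Several of the "no priors" options above are standard clustering methods. The sketch below takes one of them as an example: hierarchical clustering with single linkage on the distance 1 - |r| between metabolite levels, with the tree cut at a fixed height. It is implemented as connected components via union-find, because cutting a single-linkage tree at height h yields exactly the components of the graph linking pairs at distance ≤ h. Several choices here are assumptions for illustration: single linkage, absolute correlation (so inversely correlated metabolites cluster together) and the 0.2 cut-off. The metabolite names and values are made up.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def single_linkage_clusters(data, max_distance=0.2):
    """Single-linkage clusters of metabolites, distance = 1 - |r|.

    data maps metabolite name -> measurements across the same individuals.
    """
    names = list(data)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if 1 - abs(pearson(data[a], data[b])) <= max_distance:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical metabolite levels in five individuals
data = {
    "met_1": [1, 2, 3, 4, 5],
    "met_2": [2, 4, 6, 8, 11],   # strongly positively correlated with met_1
    "met_3": [5, 4, 3, 2, 1],    # perfectly negatively correlated with met_1
    "met_4": [1, 3, 2, 5, 1],    # weakly correlated with the others
}
print(single_linkage_clusters(data))  # [['met_1', 'met_2', 'met_3'], ['met_4']]
```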
Once clustered, each group of metabolites would be treated as a single exposure in the MR analysis. The genetic variants for each metabolite in the group would be used to instrument the group.
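Pooling instruments for a cluster, as described above, could look like the following sketch. It takes the union of the variants instrumenting each member metabolite, keeping shared variants once. The function name and data are hypothetical. One design point is worth flagging: variants pooled from different metabolites may be in linkage disequilibrium with one another. A pooled set would therefore likely need re-clumping before use, a question of the kind Chapter 8 is intended to address.

```python
def instruments_for_cluster(cluster, instruments):
    """Pool the instruments of every metabolite in a cluster.

    instruments maps metabolite name -> list of SNP identifiers.
    SNPs shared by several metabolites are kept once, in first-seen order.
    Metabolites without instruments contribute nothing.
    """
    pooled = []
    seen = set()
    for metabolite in cluster:
        for snp in instruments.get(metabolite, []):
            if snp not in seen:
                seen.add(snp)
                pooled.append(snp)
    return pooled


# Hypothetical instruments for a three-metabolite cluster
instruments = {"met_1": ["rs1", "rs2"], "met_2": ["rs2", "rs3"]}
print(instruments_for_cluster(["met_1", "met_2", "met_3"], instruments))
# ['rs1', 'rs2', 'rs3']
```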
2.2.2.8 Chapter 8: Instrumentation
2.2.2.8.1 Overview
Having explored the different methods for clustering metabolites, we need to establish rules for instrumenting these different types of cluster. This chapter will lay out a set of rules for instrumenting metabolites in MR analysis, with use cases. It is a continuation of Chapter 7. It feels like a distinct chapter at the moment, but once more work has been done it could be combined with Chapter 7.
2.2.2.8.2 Progress
Not started.
2.2.2.9 Chapter 9: MR step 2
2.2.2.9.1 Overview
Having identified metabolites associated with increased adiposity, metabolite clusters, and how to instrument those clusters, we can perform the second step of the MR, investigating associations between metabolites and diseases. It will be possible to run the MR against all diseases in MR Base and to present the results in a searchable online database. However, only the diseases identified in the systematic review will be discussed.
2.2.2.9.2 Progress
I am working with Ben Elsworth to categorise all MR Base GWASs so that they can easily be subset, for example to analyse all anthropometric traits against all smoking traits. The groundwork for the code for this analysis is complete. I will test it using a few metabolites and a few diseases for which an association is known to exist, known not to exist, or is uncertain. The code is scalable, so once Chapters 7 and 8 are complete the analysis can be run within a week.
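Once each GWAS carries a category label, subsetting the analysis reduces to filtering by category and forming every exposure-outcome pair. The sketch below assumes a simple mapping from study ID to a single category. The IDs and category labels are hypothetical, and the actual categorisation scheme being developed with Ben Elsworth may differ.

```python
from itertools import product

# Hypothetical catalogue mapping study IDs to a single category label
CATALOGUE = {
    "study-001": "anthropometric",
    "study-002": "anthropometric",
    "study-003": "smoking",
    "study-004": "smoking",
    "study-005": "metabolite",
}


def ids_in(category, catalogue=CATALOGUE):
    """Sorted study IDs carrying the given category label."""
    return sorted(sid for sid, cat in catalogue.items() if cat == category)


def analysis_pairs(exposure_category, outcome_category, catalogue=CATALOGUE):
    """Every (exposure study, outcome study) pair for two categories."""
    return list(product(ids_in(exposure_category, catalogue),
                        ids_in(outcome_category, catalogue)))


pairs = analysis_pairs("anthropometric", "smoking")
print(len(pairs), pairs[0])  # 4 ('study-001', 'study-003')
```

Because each pair is an independent MR analysis, the pairs can be distributed across jobs or cores. This is one way such code can scale to all MR Base outcomes.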
2.2.2.10 Chapter 10: Discussion/limitations/conclusion
2.2.2.10.1 Overview
This chapter will tie everything together and present a diagram that outlines the pipeline for performing MR analysis of this type.
2.2.2.10.2 Progress
Not started.
2.2.3 Other
2.2.3.1 Courses
I have been on a number of courses this year (shown in figure). I plan on going on ………..
2.2.3.2 Conferences/presentations
I have presented my work at the following:
- Faculty of Health Sciences research showcase, presentation - Metabolite profiles as markers of risk
- Faculty of Health Sciences research showcase, poster - Metabolite profiling of multiple measures of adiposity: A Mendelian randomization analysis
- Metabolomics 2019, poster - Metabolite profiling of multiple measures of adiposity: A Mendelian randomization analysis
- MR conference 2019, poster - MR-Vis: A tool for the visualisation of high-dimensional Mendelian randomization results
2.2.3.3 Teaching
I have taught on the following:
- Mendelian randomization, Bristol Medical School short course
- Mendelian randomization conference MR course, conference workshop
- Introduction to R, Bristol Medical School short course
- Introduction to data visualisation and web applications using R, Bristol Medical School short course
- A one-week course at the University of Pavia with Kaitlin: Causal Inference and Mendelian randomization, Department of Brain and Behavioural Sciences, University of Pavia, Italy
2.2.3.4 Public engagement
I’ve done much less than last year:
- Creative Reactions, lead - 50 artists and 50 researchers, with > 5,000 visitors
- Creative Reactions, participated - research turned into an artwork
- Talks - have given a number of talks to the public
- MRC IEU @ Greenman
- ~£20,000 in grants awarded in review period (including £14,985 from the EPSRC)
- Nic and I are writing an application to the Wellcome Trust for ~£50,000 to fund a public engagement project for the research group
- I am working with a research fellow in Maths on a bid to Arts Council England for ~£50,000, supported by the Head of the School of Arts and Maths and the Population Health Science Institute
2.2.4 Other work
2.2.4.1 Placement
Funds have been requested (to extend the PhD by three months) to enable Matthew to work with Professor Ruth Loos at the Icahn School of Medicine at Mount Sinai, New York. There he would receive training in computational and data analytics to characterise the genetics of body composition. This will enable him to extend his current analysis of adiposity -> metabolites -> diseases to more accurate measures of body composition.
Matthew has investigated the effects of BMI and waist-to-hip ratio (WHR), but the lack of strong genetic characterisation of body fat (BF) limits what we can learn from these analyses. Matthew has also been using combinations of measures (profiles) to investigate the underlying biological mechanisms driving associations, and is developing ways to investigate metabolites as profiles of risk. Combining the skills, expertise and data of our group and Professor Loos’ group will enable Matthew to extend his work to BF. This will give a better understanding of the biological mechanisms driving associations between increased adiposity and disease.
Matthew will work with Professor Loos, a world leader in human genetics and body composition, to explore BF genetics and increased adiposity profiles. Professor Loos is a member of the steering committee of the global BMI genome-wide association study (GWAS) consortium and set up the global body composition GWAS consortium. The Loos Lab is an interdisciplinary team aiming to identify and characterize genes to better understand biological pathways. The team includes:

- JJ Wang, a postdoctoral fellow with expertise in integrating genomics and metabolomics data
- Arden Moscati, a computational geneticist with expertise in the etiological overlap between traits and diseases
- Daiane Hemerich, a postdoctoral fellow with expertise in fine-mapping and functional annotation

Much of the lab's work is conducted in BioMe, a biobank of ~50,000 ancestrally diverse individuals (European (32%), African (24%), Hispanic/Latino (35%) and other/mixed ancestries (9%)).
Matthew will contribute to characterizing the genetics of BF using whole genome and exome sequence data. This will involve the use of fine-mapping, co-localisation and integrated approaches to investigate the genetics of multiple measures of body composition. This placement, discussed and agreed with Professor Loos, is a natural and valuable progression of Matthew’s work to develop our understanding of the mechanisms driving disease risk. With UK Biobank due to release metabolite data at a similar time, the placement will provide the appropriate data and skills to (i) identify genetic variants for MR investigations of metabolites and diseases and (ii) to integrate these genetic variants with the metabolomics work already completed and to be performed in UK Biobank.
2.2.4.2 GWAS of glycosuria
We conducted a genome-wide association study (GWAS) of glycosuria (sugar in urine) in pregnant mothers from the Avon Longitudinal Study of Parents and Children (ALSPAC). Because no external data sources were available, replication was not possible. Instead, we performed a GWAS in the Northern Finland Birth Cohort 1986 (NFBC1986), using the mothers' phenotype and their offspring's genotype. To estimate the maternal effects from offspring genotypes, we doubled the effect estimates and standard errors of the GWAS results22–24. The GitHub repository provides all scripts and data. We are currently revising the manuscript in response to reviewer comments.
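The doubling step can be written out explicitly. Two simplifying assumptions are usual here: the offspring carries one of the mother's two alleles at random, and the offspring's genotype has no direct effect on the mother's phenotype. Under these assumptions, the association with offspring genotype is roughly half the maternal effect, so both the estimate and its standard error are multiplied by two. For a binary trait such as glycosuria these estimates are log odds ratios. Doubling both leaves the Z statistic, and hence the P value, unchanged. The function names and input values below are hypothetical.

```python
import math


def maternal_from_offspring(beta_offspring, se_offspring):
    """Double offspring-genotype estimates to approximate maternal effects.

    Assumes offspring inherit one maternal allele at random and that there
    is no direct offspring genetic effect on the maternal phenotype.
    """
    return 2 * beta_offspring, 2 * se_offspring


def odds_ratio_ci(log_or, se, z_crit=1.96):
    """Convert a log odds ratio and its SE to an OR with a 95% CI."""
    return (math.exp(log_or),
            math.exp(log_or - z_crit * se),
            math.exp(log_or + z_crit * se))


beta_off, se_off = 0.2, 0.06  # hypothetical offspring-genotype log OR and SE
beta_mat, se_mat = maternal_from_offspring(beta_off, se_off)
print(odds_ratio_ci(beta_mat, se_mat))
# The Z statistic is unaffected by doubling both terms
print(beta_off / se_off, beta_mat / se_mat)
```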
2.2.4.2.0.1 Abstract
Glycosuria is a condition where glucose is detected in urine at higher concentrations than normal. Glycosuria at some point during pregnancy has an estimated prevalence of 50% and is associated with adverse outcomes in both mothers and offspring. Little is currently known about the genetic contribution to this trait or the extent to which it overlaps with other seemingly related traits, e.g. diabetes. We performed a genome-wide association study (GWAS) for self-reported glycosuria in pregnant mothers from the Avon Longitudinal Study of Parents and Children (ALSPAC; cases/controls=1,249/5,140). We identified two loci, one of which (lead SNP=rs13337037; chromosome 16; odds ratio (OR) of glycosuria per effect allele: 1.42; 95%CI: 1.30,1.56; P=1.97x10-13) was then validated using an obstetric-measure of glycosuria measured in the same cohort (227/6,639). We performed a secondary GWAS in the 1986 Northern Finland Birth Cohort (NFBC1986; 747/2,991) using midwife-reported glycosuria and offspring genotype as a proxy for maternal genotype. The equivalent effect estimate for rs13337037 in this cohort was OR 1.57 (95% CI: 1.30,1.83; P=9.8x10-4). In follow-up analyses, we saw little evidence of shared genetic underpinnings with the exception of urinary albumin-to-creatinine ratio (Rg=0.64; SE=0.22; P=0.0042), a biomarker of kidney disease. In conclusion, we identified a genetic association with self-reported glycosuria during pregnancy, with the lead SNP located 15 kb upstream of SLC5A2, a target of antidiabetic drugs. The lack of strong genetic correlation with seemingly related traits such as type 2 diabetes suggests different genetic risk factors exist for glycosuria during pregnancy.
Figure 6: Manhattan plot of a GWAS of glycosuria in ALSPAC mothers